Modification of UCT with Patterns in Monte-Carlo Go
Abstract
The UCB1 algorithm for the multi-armed bandit problem has already been extended to the UCT algorithm (Upper bound Confidence for Tree), which works for minimax tree search. We have developed a Monte-Carlo Go program, MoGo, which is the first computer Go program to use UCT. We explain our modification of UCT for the Go application, as well as the intelligent random simulation with patterns, which has significantly improved MoGo's performance. We also discuss UCT combined with pruning techniques for large Go boards, and the parallelization of UCT. MoGo is now a top-level Go program on 9×9 and 13×13 Go boards.
Key-words: Computer Go, Exploration-exploitation, UCT, Monte-Carlo, Patterns
∗ projet TAO, INRIA-Futurs, LRI, Bâtiment 490, Université Paris-Sud, 91405 ORSAY CEDEX, France
† Centre de Mathématiques Appliquées, École Polytechnique, 91128 PALAISEAU CEDEX, France
‡ Centre de Mathématiques Appliquées, École Polytechnique, 91128 PALAISEAU CEDEX, France
§ projet TAO, INRIA-Futurs, LRI, Bâtiment 490, Université Paris-Sud, 91405 ORSAY CEDEX, France
inria-00117266, version 3, 20 Dec 2006
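The abstract says UCT extends the UCB1 bandit rule to tree search. The following is a minimal sketch, assuming the textbook UCB1 index (empirical mean plus sqrt(2 ln n / n_j)) rather than whatever tuned variant MoGo actually uses. It shows only the selection step that UCT applies at each tree node, run on a one-level tree so that it reduces to a plain bandit. The class `Node`, the functions `ucb1_score`, `select_move` and `run_bandit`, the constant `c`, and the Bernoulli win probabilities standing in for Go playouts are all hypothetical illustrations. Tree expansion, pattern-based simulations, pruning and parallelization are not shown.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """Visit statistics for one node of a UCT tree (hypothetical structure)."""
    visits: int = 0
    total_reward: float = 0.0
    children: dict = field(default_factory=dict)  # move -> Node

    def mean(self) -> float:
        # Empirical win rate of the simulations that passed through this node.
        return self.total_reward / self.visits if self.visits else 0.0


def ucb1_score(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """UCB1 index: empirical mean plus an exploration bonus.

    With c = sqrt(2) this is mean + sqrt(2 * ln(n) / n_j), the textbook UCB1.
    """
    if child.visits == 0:
        return math.inf  # untried moves are always selected first
    return child.mean() + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_move(node: Node):
    """Return the child move with the highest UCB1 index (the UCT selection step)."""
    return max(node.children, key=lambda m: ucb1_score(node.children[m], node.visits))


def run_bandit(win_probs, iterations=5000, seed=0):
    """One-level UCT (i.e. plain UCB1) where each move's playout is a Bernoulli draw."""
    rng = random.Random(seed)
    root = Node(children={m: Node() for m in range(len(win_probs))})
    for _ in range(iterations):
        move = select_move(root)
        # Hypothetical stand-in for a random Go playout: win with probability win_probs[move].
        reward = 1.0 if rng.random() < win_probs[move] else 0.0
        child = root.children[move]
        child.visits += 1
        child.total_reward += reward
        root.visits += 1
    return {m: n.visits for m, n in root.children.items()}


if __name__ == "__main__":
    print(run_bandit([0.2, 0.5, 0.8]))
```

With `run_bandit([0.2, 0.5, 0.8])`, most of the 5,000 simulations should go to the move with win rate 0.8, while the weaker moves still receive a share of exploration that grows only logarithmically. In full UCT the same selection call would be repeated from the root down to a leaf, and each playout result would be backed up along the whole path.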
Similar articles
Exploration exploitation in Go: UCT for Monte-Carlo Go
The UCB1 algorithm for the multi-armed bandit problem has already been extended to the UCT algorithm, which works for minimax tree search. We have developed a Monte-Carlo program, MoGo, which is the first computer Go program using UCT. We explain our modifications of UCT for the Go application, including efficient memory management, parametrization, ordering of non-visited nodes and parallelization. MoGo is n...
To UCT, or not to UCT? (Position Paper)
Monte-Carlo search is successfully used in simulation-based planning for various large-scale sequential decision problems, and the UCT algorithm (Kocsis and Szepesvári 2006) seems to be the choice in most (if not all) such recent success stories. Based on some recent discoveries in theory and empirical analysis of Monte-Carlo search, here we argue that, if online sequential decision making is y...
Towards Generalizing the Success of Monte-Carlo Tree Search beyond the Game of Go
Monte-Carlo Tree Search and specifically the variants of the UCT algorithm have been a breakthrough in AI for the board game Go. However, UCT has had limited applicability to other domains. We study the limitations of some of the existing variants of UCT in a small-scale Markov decision process (MDP), and propose new variants that can reduce those limitations. Our experiments show great improve...
Adding Expert Knowledge and Exploration in Monte-Carlo Tree Search
We present a new exploration term, more efficient than classical UCT-like exploration terms and combining efficiently expert rules, patterns extracted from datasets, All-Moves-As-First values and classical online values. As this improved bandit formula does not solve several important situations (semeais, nakade) in computer Go, we present three other important improvements which are central in...
Generalized Rapid Action Value Estimation
Monte Carlo Tree Search (MCTS) is the state-of-the-art algorithm for many games including the game of Go and General Game Playing (GGP). The standard algorithm for MCTS is Upper Confidence bounds applied to Trees (UCT). For games such as Go a big improvement over UCT is the Rapid Action Value Estimation (RAVE) heuristic. We propose to generalize the RAVE heuristic so as to have more accurate es...
Publication date: 2006